Data Visualization Tools
Demo of creating maps using Python (folium, Plotly)
How to embed visuals on your website

Python, Anaconda :
Packages used:
Install packages in python : https://packaging.python.org/tutorials/installing-packages/
pandas, geopandas
Data Source:
Online Courses:
## Import required packages
import pandas as pd
import geopandas as gpd
import folium
from folium import plugins
import branca
import time
start_time = time.time()
## Load geojson as geopandas dataframe
dc_zil_gdf = gpd.read_file("zillow-neighborhoods.geojson")
dc_zil_gdf.head()
| | city | name | regionid | county | state | geometry |
|---|---|---|---|---|---|---|
| 0 | Washington | Catholic University | 273159 | District of Columbia | DC | POLYGON ((-77.00433 38.94064, -77.00423 38.940... |
| 1 | Washington | McLean Gardens | 121759 | District of Columbia | DC | POLYGON ((-77.07520 38.93977, -77.07475 38.938... |
| 2 | Washington | Lincoln Heights | 121751 | District of Columbia | DC | POLYGON ((-76.92405 38.89835, -76.92303 38.898... |
| 3 | Washington | Kenilworth | 121743 | District of Columbia | DC | POLYGON ((-76.93406 38.91220, -76.93426 38.911... |
| 4 | Washington | Bellevue | 121674 | District of Columbia | DC | POLYGON ((-77.01639 38.80932, -77.01753 38.808... |
dc_zil_gdf.shape
(137, 6)
dc_crime_2019 = pd.read_csv('Crime_Incidents_in_2019.csv')
dc_crime_2019.columns
Index(['X', 'Y', 'CCN', 'REPORT_DAT', 'SHIFT', 'METHOD', 'OFFENSE', 'BLOCK',
'XBLOCK', 'YBLOCK', 'WARD', 'ANC', 'DISTRICT', 'PSA',
'NEIGHBORHOOD_CLUSTER', 'BLOCK_GROUP', 'CENSUS_TRACT',
'VOTING_PRECINCT', 'LATITUDE', 'LONGITUDE', 'BID', 'START_DATE',
'END_DATE', 'OBJECTID', 'OCTO_RECORD_ID'],
dtype='object')
dc_crime_2019.head()
| | X | Y | CCN | REPORT_DAT | SHIFT | METHOD | OFFENSE | BLOCK | XBLOCK | YBLOCK | ... | BLOCK_GROUP | CENSUS_TRACT | VOTING_PRECINCT | LATITUDE | LONGITUDE | BID | START_DATE | END_DATE | OBJECTID | OCTO_RECORD_ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -76.982944 | 38.887599 | 10199597 | 2019-11-07T11:41:36.000Z | DAY | OTHERS | THEFT/OTHER | 1500 - 1599 BLOCK OF INDEPENDENCE AVENUE SE | 401480.0 | 135528.0 | ... | 006801 2 | 6801.0 | Precinct 87 | 38.887592 | -76.982941 | NaN | 2019-11-07T10:36:52.000Z | 2019-11-07T11:42:02.000Z | 429611163 | 10199597-01 |
| 1 | -77.010378 | 38.820469 | 17084415 | 2019-01-28T00:00:00.000Z | MIDNIGHT | GUN | HOMICIDE | 130 - 199 BLOCK OF IRVINGTON STREET SW | 399099.0 | 128076.0 | ... | 010900 2 | 10900.0 | Precinct 126 | 38.820461 | -77.010375 | NaN | 2017-05-19T22:58:53.000Z | 2017-05-20T02:26:45.000Z | 429841378 | 17084415-01 |
| 2 | -76.952665 | 38.920544 | 18208996 | 2019-03-22T16:18:15.000Z | EVENING | OTHERS | THEFT/OTHER | 2400 BLOCK OF MARKET STREET NE | 404105.0 | 139186.0 | ... | 009000 1 | 9000.0 | Precinct 139 | 38.920536 | -76.952663 | NaN | 2018-12-09T17:01:49.000Z | 2018-12-09T18:49:21.000Z | 429890611 | 18208996-01 |
| 3 | -77.027565 | 38.897353 | 18221681 | 2019-01-01T10:24:06.000Z | DAY | OTHERS | THEFT/OTHER | 1100 - 1199 BLOCK OF F STREET NW | 397609.0 | 136611.0 | ... | 005800 1 | 5800.0 | Precinct 129 | 38.897346 | -77.027563 | DOWNTOWN | 2018-12-31T11:49:19.000Z | 2018-12-31T14:43:21.000Z | 429890721 | 18221681-01 |
| 4 | -77.021929 | 38.899129 | 18221708 | 2019-01-01T15:48:01.000Z | EVENING | OTHERS | THEFT/OTHER | 700 - 799 BLOCK OF 7TH STREET NW | 398098.0 | 136808.0 | ... | 005800 1 | 5800.0 | Precinct 129 | 38.899121 | -77.021926 | DOWNTOWN | 2018-12-31T12:48:46.000Z | 2018-12-31T12:51:47.000Z | 429890728 | 18221708-01 |
5 rows × 25 columns
dc_crime_2019['NEIGHBORHOOD_CLUSTER']
0 Cluster 26
1 Cluster 39
2 Cluster 24
3 Cluster 8
4 Cluster 8
...
33905 Cluster 25
33906 Cluster 33
33907 Cluster 33
33908 Cluster 25
33909 Cluster 25
Name: NEIGHBORHOOD_CLUSTER, Length: 33910, dtype: object
dc_zil_gdf.columns
Index(['city', 'name', 'regionid', 'county', 'state', 'geometry'], dtype='object')
import pyproj
from shapely.geometry import shape, Point
from shapely.ops import transform
from functools import partial
dc_crime_2019['neighborhood'] = ""
long = dc_crime_2019.columns.get_loc('LONGITUDE')
lat = dc_crime_2019.columns.get_loc('LATITUDE')
geometry = dc_zil_gdf.columns.get_loc('geometry')
name = dc_zil_gdf.columns.get_loc('name')
## use shapely to check if lat/lon is within the zillow neighborhood shape
for i in range(len(dc_crime_2019)):
    point = Point(dc_crime_2019.iloc[i, long], dc_crime_2019.iloc[i, lat])  ## Longitude, Latitude
    for j in range(len(dc_zil_gdf)):
        polygon = shape(dc_zil_gdf.iloc[j, geometry])
        if polygon.contains(point):
            dc_crime_2019.iloc[i, dc_crime_2019.columns.get_loc('neighborhood')] = dc_zil_gdf.iloc[j, name]
dc_crime_2019.to_csv("dc_crime_2019_final.csv", index = False) ## write the data so we don't have to re-run this every time
dc_crime_2019.head()
| | X | Y | CCN | REPORT_DAT | SHIFT | METHOD | OFFENSE | BLOCK | XBLOCK | YBLOCK | ... | CENSUS_TRACT | VOTING_PRECINCT | LATITUDE | LONGITUDE | BID | START_DATE | END_DATE | OBJECTID | OCTO_RECORD_ID | neighborhood |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -76.982944 | 38.887599 | 10199597 | 2019-11-07T11:41:36.000Z | DAY | OTHERS | THEFT/OTHER | 1500 - 1599 BLOCK OF INDEPENDENCE AVENUE SE | 401480.0 | 135528.0 | ... | 6801.0 | Precinct 87 | 38.887592 | -76.982941 | NaN | 2019-11-07T10:36:52.000Z | 2019-11-07T11:42:02.000Z | 429611163 | 10199597-01 | Kingman Park |
| 1 | -77.010378 | 38.820469 | 17084415 | 2019-01-28T00:00:00.000Z | MIDNIGHT | GUN | HOMICIDE | 130 - 199 BLOCK OF IRVINGTON STREET SW | 399099.0 | 128076.0 | ... | 10900.0 | Precinct 126 | 38.820461 | -77.010375 | NaN | 2017-05-19T22:58:53.000Z | 2017-05-20T02:26:45.000Z | 429841378 | 17084415-01 | Bellevue |
| 2 | -76.952665 | 38.920544 | 18208996 | 2019-03-22T16:18:15.000Z | EVENING | OTHERS | THEFT/OTHER | 2400 BLOCK OF MARKET STREET NE | 404105.0 | 139186.0 | ... | 9000.0 | Precinct 139 | 38.920536 | -76.952663 | NaN | 2018-12-09T17:01:49.000Z | 2018-12-09T18:49:21.000Z | 429890611 | 18208996-01 | Fort Lincoln |
| 3 | -77.027565 | 38.897353 | 18221681 | 2019-01-01T10:24:06.000Z | DAY | OTHERS | THEFT/OTHER | 1100 - 1199 BLOCK OF F STREET NW | 397609.0 | 136611.0 | ... | 5800.0 | Precinct 129 | 38.897346 | -77.027563 | DOWNTOWN | 2018-12-31T11:49:19.000Z | 2018-12-31T14:43:21.000Z | 429890721 | 18221681-01 | Penn Quarter |
| 4 | -77.021929 | 38.899129 | 18221708 | 2019-01-01T15:48:01.000Z | EVENING | OTHERS | THEFT/OTHER | 700 - 799 BLOCK OF 7TH STREET NW | 398098.0 | 136808.0 | ... | 5800.0 | Precinct 129 | 38.899121 | -77.021926 | DOWNTOWN | 2018-12-31T12:48:46.000Z | 2018-12-31T12:51:47.000Z | 429890728 | 18221708-01 | Chinatown |
5 rows × 26 columns
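To make the containment check in the nested loop above more concrete, here is a minimal pure-Python sketch of a ray-casting point-in-polygon test. It is only an illustration of the idea: shapely's `contains` is implemented differently (and handles holes, multipolygons, and boundary cases), and the names `point_in_ring`, `first_match`, and the toy squares are hypothetical, not part of the notebook's data.

```python
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float]  # (longitude, latitude), same order as Point(long, lat)


def point_in_ring(point: Coord, ring: List[Coord]) -> bool:
    """Ray casting: count how many edges a horizontal ray from the point crosses."""
    x, y = point
    inside = False
    n = len(ring)
    for k in range(n):
        x1, y1 = ring[k]
        x2, y2 = ring[(k + 1) % n]
        # Only edges that straddle the horizontal line through the point matter
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside


def first_match(point: Coord, named_rings: Dict[str, List[Coord]]) -> Optional[str]:
    """Return the name of the first ring containing the point, or None."""
    for ring_name, ring in named_rings.items():
        if point_in_ring(point, ring):
            return ring_name
    return None


# Hypothetical toy "neighborhoods" (unit squares), not real Zillow shapes
toy_neighborhoods = {
    "West": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "East": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}

print(first_match((0.5, 0.5), toy_neighborhoods))
print(first_match((1.5, 0.25), toy_neighborhoods))
print(first_match((3.0, 3.0), toy_neighborhoods))
```

Looping over every incident and every polygon scales with rows × polygons, which is why the result is cached to CSV. geopandas also offers a spatial join (`geopandas.sjoin`) that performs this kind of point-to-polygon matching in one vectorized call and may be worth trying on larger datasets.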
dc_crime_2019 = pd.read_csv("dc_crime_2019_final.csv")
nhood_incidents_all = dc_crime_2019.neighborhood.value_counts()
nhood_map = dc_zil_gdf.merge(nhood_incidents_all.to_frame('Incidents_All'), left_on = 'name',right_index = True)
nhood_map.head()
| | city | name | regionid | county | state | geometry | Incidents_All |
|---|---|---|---|---|---|---|---|
| 0 | Washington | Catholic University | 273159 | District of Columbia | DC | POLYGON ((-77.00433 38.94064, -77.00423 38.940... | 130 |
| 1 | Washington | McLean Gardens | 121759 | District of Columbia | DC | POLYGON ((-77.07520 38.93977, -77.07475 38.938... | 19 |
| 2 | Washington | Lincoln Heights | 121751 | District of Columbia | DC | POLYGON ((-76.92405 38.89835, -76.92303 38.898... | 66 |
| 3 | Washington | Kenilworth | 121743 | District of Columbia | DC | POLYGON ((-76.93406 38.91220, -76.93426 38.911... | 47 |
| 4 | Washington | Bellevue | 121674 | District of Columbia | DC | POLYGON ((-77.01639 38.80932, -77.01753 38.808... | 206 |
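One thing to watch in the merge above: `merge` defaults to an inner join, so any neighborhood with no matched incidents is dropped from `nhood_map` instead of appearing with a count of 0. The sketch below shows the difference on hypothetical toy data (the frames and the "Quietville" name are made up for illustration); whether a left join is what you want depends on whether empty neighborhoods should be drawn on the map.

```python
import pandas as pd

# Hypothetical toy data standing in for dc_zil_gdf and the crime table
neighborhoods = pd.DataFrame({"name": ["Bellevue", "Kenilworth", "Quietville"]})
crimes = pd.DataFrame({"neighborhood": ["Bellevue", "Bellevue", "Kenilworth"]})

counts = crimes["neighborhood"].value_counts()

# Default (inner) merge: "Quietville" has no incidents, so it disappears
inner = neighborhoods.merge(
    counts.to_frame("Incidents_All"), left_on="name", right_index=True
)

# Left merge keeps every neighborhood; missing counts are filled with 0
left = neighborhoods.merge(
    counts.to_frame("Incidents_All"), left_on="name", right_index=True, how="left"
)
left["Incidents_All"] = left["Incidents_All"].fillna(0).astype(int)

print(len(inner), len(left))
print(left)
```

With the left join, a style function that checks for `Incidents_All == 0` has rows to act on.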
nhood_map['Incidents_All'].max()
import matplotlib.pyplot as plt
plt.hist(nhood_map['Incidents_All'], bins=10)
plt.show()
max(nhood_map['Incidents_All'])
1892
## identifies the center point of all the neighborhood shapes
centroid=dc_zil_gdf.geometry.centroid
## initiates a map based on the centroid
m=folium.Map(location=[centroid.y.mean(), centroid.x.mean()], zoom_start=12)
m
nhood_map['QP'] = nhood_map['Incidents_All'] / nhood_map['Incidents_All'].sum()
nhood_map['QP_str'] = nhood_map['QP'].apply(lambda x : str(round(x*100, 1)) + '%')
name = "DC Crime Map"
leg_brks = [0, 50.0, 150.0, 250.0,500,750.0, 1000.0, 1892.0]
colorscale = branca.colormap.linear.YlOrRd_09.scale(nhood_map['Incidents_All'].min(), nhood_map['Incidents_All'].max())
colorscale = colorscale.to_step(index = leg_brks) ## sets the legend breaks explicitly (7 bins)
colorscale.caption = name ## adds name for legend
colorscale
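The break points in `leg_brks` are meant to split the incident counts into seven color bins. As a rough illustration of what a step color scale does with those breaks, the sketch below looks up the bin for a few counts using the standard library. The helper `bin_index` is hypothetical, and branca's own lookup may treat values that fall exactly on a break differently.

```python
from bisect import bisect_right

leg_brks = [0, 50.0, 150.0, 250.0, 500, 750.0, 1000.0, 1892.0]


def bin_index(value, breaks):
    """Return which of the len(breaks) - 1 bins a value falls into (0-based)."""
    # bisect_right returns the position of the first break strictly greater than value
    i = bisect_right(breaks, value) - 1
    # Clamp so values at or beyond either end land in the first or last bin
    return max(0, min(i, len(breaks) - 2))


# Counts taken from the nhood_map preview, plus the maximum
for v in (19, 130, 206, 1892):
    print(v, "-> bin", bin_index(v, leg_brks))
```

Because most neighborhoods have a few hundred incidents or fewer, uneven breaks like these spread the colors more usefully than seven equal-width bins over 0 to 1892 would.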
centroid=dc_zil_gdf.geometry.centroid
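## initiates a map based on the centroid
## note: recent folium releases no longer bundle the Stamen tiles; a built-in set such as "CartoDB positron" can be used instead
m=folium.Map(location=[centroid.y.mean(), centroid.x.mean()], tiles="Stamen Toner", zoom_start=12)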
m
# nhood_map['QP'] = nhood_map['Incidents_All'] / nhood_map['Incidents_All'].sum()
# nhood_map['QP_str'] = nhood_map['QP'].apply(lambda x : str(round(x*100, 1)) + '%')
# from branca.colormap import linear
# nbh_count_colormap = linear.YlGnBu_09.scale(min(nhood_map['Incidents_All']),
# max(nhood_map['Incidents_All']))
## identifies the center point of all the neighborhood shapes
centroid=dc_zil_gdf.geometry.centroid
## initiates a map based on the centroid
m=folium.Map(location=[centroid.y.mean(), centroid.x.mean()], tiles="Stamen Toner", zoom_start=12)
style_function = lambda x: {
    'weight': 1,
    'color': '#545453',
    ## if the variable is 0 the fill is a light grey; otherwise the colorscale applies
    'fillColor': '#9B9B9B' if x['properties']['Incidents_All'] == 0
                 else colorscale(x['properties']['Incidents_All']),
    ## similarly, the fill is made more transparent if the value is 0
    'fillOpacity': 0.2 if x['properties']['Incidents_All'] == 0 else 0.7,
}
folium.GeoJson(
    nhood_map,
    style_function=style_function,
    tooltip=folium.GeoJsonTooltip(
        fields=['name', 'Incidents_All', 'QP_str'],
        aliases=['Neighborhood', 'Incidents', 'Share of total'],
        localize=True
    )
).add_to(m)
## set the legend caption first, then add the legend to the map once
colorscale.caption = 'DC Crime Map 2019'
colorscale.add_to(m)
m
nhood_map['QP'] = nhood_map['Incidents_All'] / nhood_map['Incidents_All'].sum()
nhood_map['QP_str'] = nhood_map['QP'].apply(lambda x : str(round(x*100, 1)) + '%')
from branca.colormap import linear
nbh_count_colormap = linear.YlGnBu_09.scale(min(nhood_map['Incidents_All']),
max(nhood_map['Incidents_All']))
nbh_count_colormap
## identifies the center point of all the neighborhood shapes
centroid=dc_zil_gdf.geometry.centroid
## initiates a map based on the centroid
m=folium.Map(location=[centroid.y.mean(), centroid.x.mean()], tiles="Stamen Toner", zoom_start=12)
style_function = lambda x: {
    'fillColor': nbh_count_colormap(x['properties']['Incidents_All']),
    'color': 'black',
    'weight': 1.5,
    'fillOpacity': 0.7
}
folium.GeoJson(
    nhood_map,
    style_function=style_function,
    tooltip=folium.GeoJsonTooltip(
        fields=['name', 'Incidents_All', 'QP_str'],
        aliases=['Neighborhood', 'Incidents', 'Share of total'],
        localize=True
    )
).add_to(m)
## set the legend caption first, then add the legend to the map once
nbh_count_colormap.caption = 'DC Crime Map 2019'
nbh_count_colormap.add_to(m)
m
dc_crime_2019.head()
| | X | Y | CCN | REPORT_DAT | SHIFT | METHOD | OFFENSE | BLOCK | XBLOCK | YBLOCK | ... | CENSUS_TRACT | VOTING_PRECINCT | LATITUDE | LONGITUDE | BID | START_DATE | END_DATE | OBJECTID | OCTO_RECORD_ID | neighborhood |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -76.982944 | 38.887599 | 10199597 | 2019-11-07T11:41:36.000Z | DAY | OTHERS | THEFT/OTHER | 1500 - 1599 BLOCK OF INDEPENDENCE AVENUE SE | 401480.0 | 135528.0 | ... | 6801.0 | Precinct 87 | 38.887592 | -76.982941 | NaN | 2019-11-07T10:36:52.000Z | 2019-11-07T11:42:02.000Z | 429611163 | 10199597-01 | Kingman Park |
| 1 | -77.010378 | 38.820469 | 17084415 | 2019-01-28T00:00:00.000Z | MIDNIGHT | GUN | HOMICIDE | 130 - 199 BLOCK OF IRVINGTON STREET SW | 399099.0 | 128076.0 | ... | 10900.0 | Precinct 126 | 38.820461 | -77.010375 | NaN | 2017-05-19T22:58:53.000Z | 2017-05-20T02:26:45.000Z | 429841378 | 17084415-01 | Bellevue |
| 2 | -76.952665 | 38.920544 | 18208996 | 2019-03-22T16:18:15.000Z | EVENING | OTHERS | THEFT/OTHER | 2400 BLOCK OF MARKET STREET NE | 404105.0 | 139186.0 | ... | 9000.0 | Precinct 139 | 38.920536 | -76.952663 | NaN | 2018-12-09T17:01:49.000Z | 2018-12-09T18:49:21.000Z | 429890611 | 18208996-01 | Fort Lincoln |
| 3 | -77.027565 | 38.897353 | 18221681 | 2019-01-01T10:24:06.000Z | DAY | OTHERS | THEFT/OTHER | 1100 - 1199 BLOCK OF F STREET NW | 397609.0 | 136611.0 | ... | 5800.0 | Precinct 129 | 38.897346 | -77.027563 | DOWNTOWN | 2018-12-31T11:49:19.000Z | 2018-12-31T14:43:21.000Z | 429890721 | 18221681-01 | Penn Quarter |
| 4 | -77.021929 | 38.899129 | 18221708 | 2019-01-01T15:48:01.000Z | EVENING | OTHERS | THEFT/OTHER | 700 - 799 BLOCK OF 7TH STREET NW | 398098.0 | 136808.0 | ... | 5800.0 | Precinct 129 | 38.899121 | -77.021926 | DOWNTOWN | 2018-12-31T12:48:46.000Z | 2018-12-31T12:51:47.000Z | 429890728 | 18221708-01 | Chinatown |
5 rows × 26 columns
ave_lat = sum(dc_crime_2019.Y)/len(dc_crime_2019.Y)
ave_long = sum(dc_crime_2019.X)/len(dc_crime_2019.X)
ave_lat
38.908362901861594
import plotly
import plotly.graph_objs as go
from plotly.subplots import make_subplots
# Mapbox access token generated for this project (replace it with your own)
mapbox_access_token = 'pk.eyJ1Ijoid2FuZzY1MDYiLCJhIjoiY2tiNGtra2ozMHVoYjJ3bzlsMThtenNyOCJ9.jPlsp6JCn_Vu_GzykjHtnw'
my_style = "mapbox://styles/wang6506/ckb4kl37t0q271jmpxc8akwsg"
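If the notebook is going to be shared or published, it can be safer to read the token from the environment than to paste it into a cell. The sketch below is one hedged way to do that; the variable name `MAPBOX_ACCESS_TOKEN` and the helper `get_mapbox_token` are assumptions for illustration, not something Mapbox or plotly requires.

```python
import os


def get_mapbox_token(env_var: str = "MAPBOX_ACCESS_TOKEN") -> str:
    """Read the Mapbox token from an environment variable (name is an assumption)."""
    token = os.environ.get(env_var, "")
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Mapbox access token"
        )
    return token


# Demonstration only: a placeholder value so the example runs end to end
os.environ.setdefault("MAPBOX_ACCESS_TOKEN", "pk.placeholder-token")
print(get_mapbox_token()[:3])
```

In the notebook this would be used as `mapbox_access_token = get_mapbox_token()` before building the layout.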
trace = go.Scattermapbox(
    lat = dc_crime_2019['Y'],
    lon = dc_crime_2019['X'],
    marker = go.scattermapbox.Marker(size = 5, opacity = 0.7),
    # hover text: one string per point (block and offense on separate lines)
    text = dc_crime_2019['BLOCK'] + '<br>' + dc_crime_2019['OFFENSE']
)
layout = go.Layout(
    title = 'DC Crime Visual',
    width = 1000, height = 1000,
    mapbox = go.layout.Mapbox(
        accesstoken = mapbox_access_token,
        bearing = -50,
        pitch = 50,
        zoom = 12,
        center = go.layout.mapbox.Center(lat=ave_lat, lon=ave_long),
        style = my_style
    ),
)
fig = go.Figure(data = trace, layout = layout)
plotly.offline.iplot(fig)
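For the last topic in the outline, embedding the visuals on a website, one common approach is to export each map as a standalone HTML file and load it in an `<iframe>` on the host page. In the notebook the exports would come from `m.save("dc_crime_map.html")` for folium and `fig.write_html("dc_crime_points.html")` for plotly. The sketch below only builds the host page with the standard library; the file names and the helpers `iframe_snippet` and `write_host_page` are hypothetical.

```python
import tempfile
from html import escape
from pathlib import Path


def iframe_snippet(src: str, width: str = "100%", height: int = 600,
                   title: str = "DC Crime Map 2019") -> str:
    """Build an <iframe> tag pointing at an exported map file."""
    return (
        f'<iframe src="{escape(src, quote=True)}" '
        f'title="{escape(title, quote=True)}" '
        f'width="{escape(str(width), quote=True)}" height="{int(height)}" '
        'style="border:0" loading="lazy"></iframe>'
    )


def write_host_page(page_path: Path, map_file: str) -> Path:
    """Write a minimal HTML page that embeds the exported map."""
    page = (
        "<!DOCTYPE html>\n"
        '<html>\n<head><meta charset="utf-8"><title>DC Crime Map</title></head>\n'
        f"<body>\n{iframe_snippet(map_file)}\n</body>\n</html>\n"
    )
    page_path.write_text(page, encoding="utf-8")
    return page_path


# In the notebook, the map files would be produced by:
#   m.save("dc_crime_map.html")             # folium
#   fig.write_html("dc_crime_points.html")  # plotly
out = write_host_page(Path(tempfile.gettempdir()) / "index.html", "dc_crime_map.html")
print(out.read_text(encoding="utf-8"))
```

The exported map file needs to be uploaded alongside the host page so the relative `src` path resolves.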